
cuda.core.system: Add basic Nvlink and Utilization support #1918

Merged
mdboom merged 7 commits into NVIDIA:main from
mdboom:cuda-core-system-jupyterlab-nvdashboard
Apr 22, 2026

@mdboom
Contributor

@mdboom mdboom commented Apr 15, 2026

These APIs are needed by rapidsai/jupyterlab-nvdashboard and rapidsai/rapids-cli.

@mdboom mdboom self-assigned this Apr 15, 2026
@mdboom mdboom added the cuda.core Everything related to the cuda.core module label Apr 15, 2026
@mdboom mdboom added this to the cuda.core v1.0.0 milestone Apr 15, 2026

@mdboom mdboom force-pushed the cuda-core-system-jupyterlab-nvdashboard branch from ac86822 to 039013e Compare April 20, 2026 18:24
@mdboom mdboom requested a review from rparolin April 20, 2026 18:25
Contributor

@rwgk rwgk left a comment


Generated with the help of Cursor GPT-5.4 Extra High Fast

Manually verified.


Medium: Invalid NVLink indices are accepted and fail late

Device.nvlink() currently accepts negative or out-of-range link indices and
returns NvlinkInfo without validating them first. That differs from existing
indexed accessors such as Device.fan(), which validate eagerly. In practice,
device.nvlink(-1) constructs successfully and only fails later when a
property such as .version is accessed, which turns a basic argument error
into a delayed runtime failure.

Relevant paths:

  • cuda_core/cuda/core/system/_device.pyx:585
  • cuda_core/cuda/core/system/_device.pyx:683
  • cuda_core/cuda/core/system/_nvlink.pxi
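A minimal sketch of eager validation, written in plain Python rather than Cython so it runs without a GPU. NvlinkInfo and Device here are hypothetical stand-ins, not the cuda.core classes. The bound is assumed to be the class-level max_links constant that the diff excerpt later in this thread sets from nvml.NVLINK_MAX_LINKS; the value 18 below is a placeholder. A real fix might instead check against the number of links the device actually exposes.

```python
class NvlinkInfo:
    """Hypothetical stand-in for cuda.core.system's NvlinkInfo."""

    # Placeholder bound; the PR reads this from nvml.NVLINK_MAX_LINKS.
    max_links = 18

    def __init__(self, device, link):
        # Validate eagerly so a bad index fails here, at construction,
        # instead of later when a property such as .version is first read.
        if not 0 <= link < self.max_links:
            raise ValueError(
                f"link index {link} is out of range [0, {self.max_links})"
            )
        self._device = device
        self._link = link


class Device:
    """Hypothetical stand-in exposing only the nvlink() accessor."""

    def nvlink(self, link):
        return NvlinkInfo(self, link)
```

With the check in the constructor, device.nvlink(-1) raises ValueError at the call site rather than deferring the failure to the first property access.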

Low: NvlinkInfo.version documents a non-existent return type

The public enum exported by cuda.core.system is NvlinkVersion, and the API
index plus tests use that spelling, but NvlinkInfo.version is annotated and
documented as NvLinkVersion. That leaks a wrong type name into the generated
help/doc output and points users at a symbol that does not exist.

Relevant paths:

  • cuda_core/cuda/core/system/_nvlink.pxi:21
  • cuda_core/docs/source/api.rst:225
  • cuda_core/tests/system/test_system_device.py:747
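One way a test could catch this kind of misspelling is to resolve the return annotation with typing.get_type_hints, which raises NameError when the annotated name does not exist. The classes and the enum member below are hypothetical pure-Python stand-ins; whether the compiled Cython property exposes its annotation in a form get_type_hints can resolve is an assumption not checked here.

```python
import enum
import typing


class NvlinkVersion(enum.IntEnum):
    # Hypothetical stand-in for the exported enum; member name is made up.
    V1 = 1


class NvlinkInfoGood:
    @property
    def version(self) -> "NvlinkVersion":
        return NvlinkVersion.V1


class NvlinkInfoBad:
    @property
    def version(self) -> "NvLinkVersion":  # misspelled, as in the finding
        return NvlinkVersion.V1


def annotation_resolves(cls, prop_name):
    """Return True if the property's annotations name real symbols."""
    fget = getattr(cls, prop_name).fget
    try:
        typing.get_type_hints(fget)
    except NameError:
        return False
    return True
```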

Low: NvlinkInfo.state has no direct test coverage

The new test_nvlink() checks construction of NvlinkInfo and accesses
.version, but it never reads .state. As a result, the wrapper path behind
NvlinkInfo.state has no direct coverage even on systems where the test does
not skip.

Relevant paths:

  • cuda_core/cuda/core/system/_nvlink.pxi:35
  • cuda_core/tests/system/test_system_device.py:734
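A minimal sketch of a check that also reads .state. It runs against hypothetical fakes so it executes without a GPU; in the real suite the same assertions would sit inside test_nvlink() and use a cuda.core.system device. The diff excerpt quoted later in this thread appears to implement .state by comparing nvml.device_get_nvlink_state(...) against nvml.EnableState.FEATURE_ENABLED, so the sketch assumes .state returns a bool.

```python
import enum


class EnableState(enum.IntEnum):
    # Hypothetical stand-in for nvml.EnableState.
    FEATURE_DISABLED = 0
    FEATURE_ENABLED = 1


class FakeNvlinkInfo:
    """Hypothetical fake mirroring the .version and .state properties."""

    def __init__(self, raw_state, version=1):
        self._raw_state = raw_state
        self._version = version

    @property
    def version(self):
        return self._version

    @property
    def state(self):
        # Same shape as the diff excerpt: compare the raw state against
        # FEATURE_ENABLED and return the resulting bool.
        return self._raw_state == EnableState.FEATURE_ENABLED


class FakeDevice:
    def __init__(self, raw_states):
        self._raw_states = raw_states

    def nvlink(self, link):
        return FakeNvlinkInfo(self._raw_states[link])


def check_nvlink(device, link):
    info = device.nvlink(link)
    assert info.version is not None  # path the existing test already covers
    state = info.state  # the path the finding says is untested
    assert isinstance(state, bool)
    return state
```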

@mdboom
Contributor Author

mdboom commented Apr 21, 2026

Thanks for having your agent fight with my agent, @rwgk. ;)

@mdboom mdboom requested a review from rwgk April 21, 2026 15:59
nvml.device_get_nvlink_state(self._device._handle, self._link) == nvml.EnableState.FEATURE_ENABLED
)

max_links = nvml.NVLINK_MAX_LINKS
Member


cdef readonly?

Contributor Author


I think class-level variables already are read-only:

>>> d.nvlink(0).max_links = 23
Traceback (most recent call last):
  File "<python-input-5>", line 1, in <module>
    d.nvlink(0).max_links = 23
    ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'cuda.core.system._device.NvlinkInfo' object attribute 'max_links' is read-only

A cdef variable would not be available from Python.
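The read-only behaviour in the traceback can be reproduced in pure Python: when instances have no __dict__, a class attribute cannot be shadowed per instance, so assigning it through an instance raises AttributeError. Cython cdef class instances likewise have no instance __dict__ by default, which is presumably why max_links behaves this way. SlottedInfo is a hypothetical stand-in, and 18 is a placeholder rather than the real NVLINK_MAX_LINKS value.

```python
class SlottedInfo:
    """Hypothetical stand-in for a cdef class with a class-level constant."""

    __slots__ = ("_link",)  # no per-instance __dict__, like a cdef class
    max_links = 18  # placeholder value, not the real NVML constant

    def __init__(self, link):
        self._link = link


def try_assign_max_links(obj, value):
    """Return the AttributeError message, or None if assignment worked."""
    try:
        obj.max_links = value
    except AttributeError as exc:
        return str(exc)
    return None
```

On CPython, try_assign_max_links(SlottedInfo(0), 23) returns the same kind of "object attribute 'max_links' is read-only" message shown in the traceback. In this pure-Python sketch the class object itself can still be rebound; only assignment through an instance is blocked.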

@leofang leofang added the feature New feature or request label Apr 22, 2026
@mdboom mdboom enabled auto-merge (squash) April 22, 2026 14:52
@mdboom mdboom merged commit c747f7b into NVIDIA:main Apr 22, 2026
94 checks passed
@github-actions

Doc Preview CI
Preview removed because the pull request was closed or merged.


Labels

cuda.core Everything related to the cuda.core module feature New feature or request


3 participants